Statistical Extraction and Visualization of Topics in the Qur'an Corpus

Authors

  • Maysum H. Panju
  • Kais Dukes
Abstract

Unsupervised machine learning techniques are described and applied to the mildly preprocessed Arabic text of the Holy Qur’an, with promising results. Topic modelling based on nonnegative matrix factorization successfully extracted meaningful topics underlying the set of 6236 verses in the corpus. Data visualization using t-SNE dimensionality reduction correctly grouped verses of the Holy Qur’an into clusters based on theme and word usage. This accessible paper begins with an introductory view of machine learning and includes motivating descriptions of the implemented techniques before presenting a summary of findings. A graphical display combining the results of topic modelling and data visualization demonstrates the consistency of the studied models.
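To make the topic-modelling step concrete, here is a minimal sketch of nonnegative matrix factorization on a toy document-term matrix. It is not the authors' implementation: the abstract does not state their preprocessing, term weighting, number of topics, or solver, so the multiplicative-update solver, the raw word counts, the rank of 2, and the four English placeholder "documents" below are all assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix v (documents x terms) as w @ h.

    Uses multiplicative updates for the squared-error objective.
    Rows of h act as topics (weights over terms); rows of w give
    each document's weight on every topic.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(rank)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_docs)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v - w @ h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_a in zip(v, approx)
               for x, y in zip(row_v, row_a)) ** 0.5


# Hypothetical placeholder documents (not verses from the corpus).
docs = [
    "mercy forgiveness mercy prayer",
    "prayer mercy forgiveness",
    "garden river reward garden",
    "reward river garden",
]
vocab = sorted({word for d in docs for word in d.split()})
# Document-term matrix of raw word counts.
v = [[float(d.split().count(term)) for term in vocab] for d in docs]

w, h = nmf(v, rank=2)
for k, topic in enumerate(h):
    top_terms = [term for _, term in sorted(zip(topic, vocab), reverse=True)[:3]]
    print(f"topic {k}: {top_terms}")
print("reconstruction error:", round(frobenius_error(v, w, h), 3))
```

Each row of `h` can be read as a topic by listing its highest-weighted terms, and each row of `w` shows how strongly a document loads on each topic. At the scale of the 6236 verses described above, a numerical library would be the practical choice over this pure-Python loop.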

Similar resources

An Interpretation of the Verses of the Prophet's Wives (with an Emphasis on New Objections)

The Holy Qur'an has not omitted any subject bearing on human guidance along the path of ascension and growth. Some opponents, however, have claimed that some of the Qur'anic discourses are unnecessary and ineffective in this field. They claim that the discussion of these topics may be taken as evidence against the originality of the Qur'an. Among these are the verses on the marriage of Prophet Muh...

A semi-supervised learning framework for biomedical event extraction based on hidden topics

OBJECTIVES: Scientists have devoted decades of effort to understanding the interaction between proteins or RNA production. The information might empower the current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to the lack of explicit structure, literature in life science, one of the most important sources of this information, prevents computer-based syst...

Applications of Topics Models to Analysis of Disaster-Related Twitter Data

The use of microblogging (using tools like Twitter and SMS messaging) during disasters offers a valuable source of information for disaster response agencies, as it often provides critical up-to-date and on-location updates about an unfolding crisis. This precipitates an interest in robust processing and visualization tools. We explore the use of Topics models for analysis of disaster-related T...

Monotheism in Help-Asking and its Educational Effects

From the point of view of the Holy Quran, monotheism as the most basic pillar of faith has different aspects of which the most important one is monotheism in recourse. Tawhid in help-asking along with Tawhid in worship is one of the important topics in the field of Qur'anic research on which various verses have been revealed and studied by commentators. In this article, after giving a terminolo...

Visualization of Text Document Corpus

From the point of view of automated text processing, natural language is highly redundant in the sense that many different words share a common or similar meaning. For a computer this can be hard to understand without some background knowledge. Latent Semantic Indexing (LSI) is a technique that helps in extracting some of this background knowledge from a corpus of text documents. This can be also viewed...


Publication date: 2014